Promote MAAP staging hubs to prod #7219
Merging this PR will trigger the following deployment actions. Support deployments
Staging deployments
Production deployments
@bsatoriu the images should be pointing to OPS, not DIT, for prod, and we want to add QGIS back in.
I added you, @grallewellyn.
… bucket names, added qgis image back
Thanks, Brian! I made the necessary updates.
yuvipanda
left a comment
To prevent drift between staging and prod, we want to keep config in common.yaml as much as possible. So the workflow can be:
- Test thing in staging via staging specific config
- When they are ready, move them to common.yaml so that's the behavior of both staging and prod
This way we minimize the config that's prod specific, and can use staging to validate issues.
Can you move most of the config to common.yaml rather than prod.yaml, and I'll merge?
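The layering behind this workflow can be sketched in a few lines. This is only an illustration, assuming the common and per-hub files are combined the way Helm combines multiple values files: later files win, nested mappings are merged key by key, and lists are replaced whole. The `merge_values` helper and every key value and hostname below are hypothetical, and Helm details such as deleting keys with `null` are not modeled.

```python
from copy import deepcopy


def merge_values(base: dict, override: dict) -> dict:
    """Return a new dict with `override` layered on top of `base`.

    Nested mappings are merged key by key; any other value (including
    lists) in `override` replaces the value from `base` outright.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical common.yaml: behaviour shared by staging and prod.
common = {
    "singleuser": {
        "extraEnv": {"MAAP_API_HOST": "api.example.org"},
        "profileList": [{"display_name": "Default", "slug": "default"}],
    }
}

# Hypothetical prod.yaml: only the settings that must differ in prod.
prod = {
    "singleuser": {
        "extraEnv": {"SCRATCH_BUCKET": "s3://example-scratch-prod/$(JUPYTERHUB_USER)"},
        "nodeSelector": {"2i2c/hub-name": "prod"},
    }
}

prod_effective = merge_values(common, prod)
```

The smaller the prod-specific dictionary stays, the more of `prod_effective` has already been exercised on staging.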
This is complete.
@bsatoriu I see the changes in prod.yaml still.
Cross posting from Slack, after @bsatoriu nudged me to point out that the changes to prod are actually required. The primary difference between staging and prod after your latest changes is:
In our experience, (2) is often used to 'test' new images before they get rolled out. This is often better achieved by having people type the image tag into the 'unlisted image' option in prod, while keeping the exact same images on staging and prod. This lets people test new images in prod without affecting others, and makes sure staging and prod match as closely as possible, rather than using staging as almost a 'development' instance. In the future, for example, if you're experimenting with s3fuse on staging, making sure the images are the same as in prod cuts down on a lot of potential issues when ramping up. So my suggestion is:
extraEnv:
  SCRATCH_BUCKET: s3://maap-scratch-prod/$(JUPYTERHUB_USER)
  MAAP_API_HOST: api.maap-project.org
  DOCKERIMAGE_PATH_DEFAULT: mas.maap-project.org/root/maap-workspaces/custom_images/maap_base:v5.0.0
This will set this as the environment variable no matter what image is used. Is that what was expected? In the last PR, I saw this was set to be the same as the name of the image, in which case it should use $(JUPYTER_IMAGE) as the value.
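The difference between a hardcoded value and a `$(JUPYTER_IMAGE)` reference can be sketched with a simplified model of `$(NAME)` substitution. This is an assumption-laden illustration, not the real implementation: `expand_refs` is a hypothetical helper, the registry paths are made up, and details such as the `$$(NAME)` escape are not modeled. It assumes, as the comment above implies, that `JUPYTER_IMAGE` holds the image the user's server was started with.

```python
import re

# Matches references of the form $(NAME).
_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_refs(value: str, known: dict) -> str:
    """Replace each $(NAME) with known[NAME]; unknown names are left as-is."""
    return _REF.sub(lambda m: known.get(m.group(1), m.group(0)), value)


# Hypothetical image picked by a user from the profile list.
spawner_env = {"JUPYTER_IMAGE": "registry.example.org/maap/custom:v1"}

# A literal value stays the same no matter which image was launched.
hardcoded = expand_refs("registry.example.org/maap/base:v5.0.0", spawner_env)

# A reference follows whichever image was actually launched.
tracking = expand_refs("$(JUPYTER_IMAGE)", spawner_env)
```

Here `hardcoded` always names the base image, while `tracking` changes with the user's selection.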
Okay, extracted out DOCKERIMAGE_PATH_BASE_IMAGE!
  WORKSPACE_BUCKET: maap-ops-workspace
nodeSelector:
  2i2c/hub-name: prod
profileList:
Normally, we would like to keep profileLists in common.yaml, and use the same image in staging and prod. The staging here is primarily for testing infrastructure changes, and we (2i2c) would like to generally keep it the exact same as prod. So that if we have tested something on staging, we're 99% confident it would work in prod.
Having different images in staging and prod could cause problems here, in case the difference between the images causes a failure when migrating. It could also cause the other parts of profileList (such as resource config) to drift out of sync between the two.
However, we also recognize that you probably want to test out different images as you're onboarding an existing userbase to this hub, and we want to be flexible.
So I see two paths forward:
1. Use the same image tags for staging and prod, and put them in common.yaml. Image testing happens purely via unlisted choice. This is the preferred way, and also where we should go long term.
2. If (1) doesn't fit with your existing workflows for building images, leave a block comment above the profileList config in staging and prod, documenting that it's duplicated, and that whoever modifies it should make sure that the only differences between the two are the image tags, with everything else kept in sync manually. We can then revisit this in 3-6 months, after the initial migration is completed and the pace of image changes has changed.
I wanna unblock y'all asap, so while I have a preference for (1), I'm happy to do either.
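If the duplicated-config path is taken, the "only the images may differ" rule could be checked mechanically instead of by eye. The following is a minimal sketch under stated assumptions: `without_images` and `only_images_differ` are hypothetical helpers, the profile entries are invented, and the check masks the whole image reference (not just the tag) wherever a key named `image` holds a string.

```python
def without_images(node):
    """Return a copy of `node` with every string-valued 'image' key masked."""
    if isinstance(node, dict):
        return {
            key: "<image>"
            if key == "image" and isinstance(value, str)
            else without_images(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [without_images(item) for item in node]
    return node


def only_images_differ(staging_profiles, prod_profiles) -> bool:
    """True when the two profile lists match once image references are masked."""
    return without_images(staging_profiles) == without_images(prod_profiles)


# Hypothetical profileList entries for illustration only.
staging = [{
    "display_name": "Default",
    "kubespawner_override": {"image": "registry.example.org/base:v5.1.0", "mem_limit": "8G"},
}]
prod = [{
    "display_name": "Default",
    "kubespawner_override": {"image": "registry.example.org/base:v5.0.0", "mem_limit": "8G"},
}]
drifted = [{
    "display_name": "Default",
    "kubespawner_override": {"image": "registry.example.org/base:v5.0.0", "mem_limit": "16G"},
}]

in_sync = only_images_differ(staging, prod)
has_drift = not only_images_differ(staging, drifted)
```

A check along these lines could run in CI against the parsed staging and prod files, so that drift in resource config is flagged before a deploy rather than discovered afterwards.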
Sorry for jumping into this conversation as I come back from leave.
Let me know if I am phrasing this correctly -
You are saying that staging and prod are meant for infrastructure testing and everything else remains the same. In that case, we (MAAP) as tenants of this infrastructure should be deploying 3 versions of your prod configuration for our own customers and venues (DIT, UAT and OPS). The tenant should not need to worry about your changes in your staging environment.
We should be able to deploy multiple 2i2c prod environments with different MAAP configurations for our testing.
Does that make sense?
On MAAP, the DIT, UAT and OPS venues come with their associated deployments of the API and data processing clusters, which impact the Jupyter extensions used in the images. So in terms of testing, we are not just testing the images, but also the entire deployment venue, which is isolated in its own cloud environment.
I added a block comment above profileList and we would like to go with option 2
Thanks @grallewellyn! I've retitled the PR slightly and merged this!
@sujen1412 I opened #7233 to split off the other conversation so we don't lose track of it!
🎉🎉🎉🎉 Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/19914343488
The deployment is failing because the image
This is another reason to keep prod and staging images the same, since staging is supposed to catch issues that affect prod. Here staging has caught an issue, and we no longer know if it'll affect prod or not (and vice versa: staging may succeed but prod may fail). So in the long run, each production environment should have its own staging where the images are the same.
Since the only difference between staging and production is the image tags, the only way the pipeline could fail is if the image doesn't exist. If the image doesn't exist, we can quickly push it and rerun the deployment. If there is something wrong with the images, then they won't launch in 2i2c, but that is a different issue.
Synchronize the latest maap staging.values.yaml updates to prod.values.yaml.